Is Kotlin Sequence Really That Bad?
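Hi everyone, have you eaten today? What did you have? A while ago (February 28, 2025), the official Kotlin WeChat account published its tech monthly for February, 《Kotlin 技术月报 | 2025 年 2 月》 ("Kotlin Tech Monthly | February 2025"). One of the articles it mentioned caught my attention: Should you use Kotlin Sequences for Performance?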
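The original article
Why did this article pique my curiosity? Because the conclusion it reaches is very counterintuitive.
As the title suggests, the article looks at Kotlin's Sequence and its performance. Interestingly, its final conclusion is:
The larger the data set and the more intermediate operations there are, the less efficient Sequence becomes.
Borrowing the original article's code as an example, suppose we have the following: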
object Db {
    fun getItems(): List<DbModel>
}
fun getItemsList(): List<UiModel> {
    return Db.getItems()
        // The operations below count as "intermediate operations"
        .filter { it.isEnabled }
        .map { UiModel(...) }
}
fun getItemsListUsingSequence(): List<UiModel> {
    return Db.getItems()
        .asSequence()
        // The operations below count as "intermediate operations"
        .filter { it.isEnabled }
        .map { UiModel(...) }
        .toList()
}
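If the original article's conclusion holds, then the more elements there are in the List returned by Db.getItems, the less efficient Sequence becomes;
and the more intermediate operations there are, the less efficient Sequence becomes.
Here are some excerpts from the original article that support this conclusion: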
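To make the two pipelines above concrete, here is a minimal runnable sketch. The fields of DbModel and UiModel and the in-memory contents of Db are assumptions made purely for illustration (the original snippet elides them); only the filter/map chain itself comes from the article.

```kotlin
// Hypothetical models: the original snippet does not show their fields.
data class DbModel(val id: Int, val isEnabled: Boolean)
data class UiModel(val id: Int)

// Hypothetical in-memory stand-in for the database: 100 items, even ids enabled.
object Db {
    fun getItems(): List<DbModel> =
        List(100) { index -> DbModel(id = index, isEnabled = index % 2 == 0) }
}

// Eager version: filter builds an intermediate List, then map builds the result List.
fun getItemsList(): List<UiModel> =
    Db.getItems()
        .filter { it.isEnabled }
        .map { UiModel(it.id) }

// Lazy version: each element passes through filter and map one at a time;
// only the terminal toList() allocates a collection.
fun getItemsListUsingSequence(): List<UiModel> =
    Db.getItems()
        .asSequence()
        .filter { it.isEnabled }
        .map { UiModel(it.id) }
        .toList()
```

Both functions return the same elements in the same order. They differ only in how many intermediate collections are allocated and in how many function calls each element goes through, which is exactly the trade-off the benchmark is trying to measure.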
Benchmark Results: Sequences can be slow
Here’s the results on my MacBook Pro M1, running on Temurin JDK 21:
| Test | Operations per second |
|---|---|
| List | 1,636,222 |
| Sequence | 1,491,436 |
| Flow | 1,192,928 |

So here’s me eating humble pie: using a sequence for simple chained operations is about 9% slower than not.
So I went ahead and tweaked each function to be more extreme, and perform a bunch more filtering and mapping. I count 7 intermediary collections created in this example, but the Flow and Sequence versions should still be creating zero. With this in mind, I expected the sequence version to pull ahead…
fun getItemsList(): List<UiModel> {
return Db.getItems()
.filter { it.isEnabled }
.map { UiModel(it.id) }
.filter { true }
.map { UiModel(it.id) }
.filter { true }
.map { UiModel(it.id) }
.filter { true }
.map { UiModel(it.id) }
}
| Test | Operations per second |
|---|---|
| List | 663,391 |
| Sequence | 364,947 |
| Flow | 671,243 |

Lessons Learnt
- Sequences can be slower due to per-element function call overhead. I’d go as far to say that they are nearly always slower today. The more complex your operation, the higher the cost.
- Flows can optimize some chained operations better than expected, but don’t use them for that. Use them for their asynchronousity.
- Collections are often the fastest choice for performance.
Eating humble pie 🥧
Apologies for the many times I’ve asked my coworkers in code reviews to use asSequence to improve performance....
Update 2: Large data set
As there were few people commenting to the effect of “that list is too small”, I re-ran the benchmark using 100,000 items (instead of 100). The differences grew…
| Test | Operations per second |
|---|---|
| List | 623 |
| Sequence | 245 |
| Flow | 792 |
| ImmutableArray | 757 |
Raising doubts
If you know Sequence reasonably well, you know that it is a lazily evaluated iterable type; you can think of it as roughly analogous to Stream in Java.
Along with collections, the Kotlin standard library contains another type – sequences (Sequence). Unlike collections, sequences don't contain elements, they produce them while iterating. Sequences offer the same functions as Iterable but implement another approach to multi-step collection processing.
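The difference between the two approaches is easiest to see by recording the order in which each step runs. The sketch below is only an illustration: the function names traceList and traceSequence and the three-element input are made up for this example and are not taken from the original article or the Kotlin documentation.

```kotlin
// Records the order in which stages run for an eager List chain:
// filter finishes for every element before map starts.
fun traceList(): List<String> {
    val log = mutableListOf<String>()
    listOf(1, 2, 3)
        .filter {
            log += "filter $it"
            it % 2 == 1
        }
        .map {
            log += "map $it"
            it * 10
        }
    return log
}

// Records the order for a lazy Sequence chain:
// each element goes through filter and then map before the next one is read.
fun traceSequence(): List<String> {
    val log = mutableListOf<String>()
    sequenceOf(1, 2, 3)
        .filter {
            log += "filter $it"
            it % 2 == 1
        }
        .map {
            log += "map $it"
            it * 10
        }
        .toList() // terminal operation: nothing above runs until this call
    return log
}
```

With the List chain, all three filter calls are logged before any map call. With the Sequence chain, the log interleaves: element 1 is filtered and mapped, element 2 is filtered out, then element 3 is filtered and mapped. No intermediate list is ever built in the second case.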
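For exactly this reason, the original article's conclusion is hard to accept. If Sequence were far less efficient than List across the board, and especially when applying complex intermediate operations to large amounts of data,
then wouldn't Sequence be useless for anything other than reducing memory usage? And IDEA would have no reason to proactively suggest asSequence in such cases to make the code more efficient.
There are other oddities as well. For example, with 100,000 elements, Flow's throughput is about 27% higher than List's (792 vs. 623 ops/s), which makes it the fastest of the four benchmarks.
With some skepticism, I decided to try it myself.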
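The original benchmark
The good news is that the original author kindly provided the benchmark code he used: https://gist.github.com/chrisbanes/ac842abd90b5810a1fe1ee05c7f0ac30
After copying the code, my first impression was that something in it looked off. I'll set that aside for now, though, and first run it exactly as written. The results were indeed what I expected: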
Benchmark (size) Mode Cnt Score Error Units
OriginalBenchmark.flow N/A thrpt 10 1197880.963 ± 14766.372 ops/s
OriginalBenchmark.list N/A thrpt 10 1344318.180 ± 38601.373 ops/s
OriginalBenchmark.sequence N/A thrpt 10 1572128.363 ± 10080.851 ops/s
| Test | ops/s |
|---|---|
| flow | 1,197,880 |
| list | 1,344,318 |
| sequence | 1,572,128 |
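In my local benchmark run, with 100 elements, the final ranking is Sequence (+17%) > List (+12%) > Flow.
That is, Sequence is about 17% more efficient than List, and List is about 12% more efficient than Flow,
which is pretty much the exact opposite of the original post's conclusion.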
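For reference, the percentages above can be re-derived from the JMH scores with a few lines of Kotlin. The helper names gainPercent and localGains are made up for this sketch; the three scores are copied from the benchmark output above.

```kotlin
// Relative throughput gain of `faster` over `slower`, in percent.
fun gainPercent(faster: Double, slower: Double): Double =
    (faster / slower - 1.0) * 100.0

// Returns (Sequence over List, List over Flow) using the local JMH scores in ops/s.
fun localGains(): Pair<Double, Double> {
    val flow = 1_197_880.963
    val list = 1_344_318.180
    val sequence = 1_572_128.363
    return gainPercent(sequence, list) to gainPercent(list, flow)
}
```

Rounded to whole numbers, the two values come out at about 17% and 12%, matching the ranking given above.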